Tree Topological Features for Unlexicalized Parsing (Coling 2010)
Abstract
Because unlexicalized parsing lacks word token information, it is important to investigate novel parsing features that improve accuracy. This paper studies a set of tree topological (TT) features, which quantitatively describe the shape of the tree dominated by each non-terminal node. The features are useful for capturing linguistic notions such as grammatical weight and syntactic branching, factors that are important to syntactic processing but overlooked in the parsing literature. Used in an ensemble classifier-based model, TT features significantly improve the parsing accuracy of our unlexicalized parser. Further, TT feature values are easy to estimate, which makes them easy to incorporate into virtually any mainstream parser.
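To make the idea of describing "the tree shape dominated by each non-terminal node" concrete, here is a minimal Python sketch. The abstract does not list the paper's actual TT features, so the three statistics below (`weight`, `height`, `branching`) are illustrative assumptions, as are the `Node` class and the `tt_features` function; they are not the authors' definitions. The leaves are part-of-speech tags, since an unlexicalized parser does not see word tokens.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A parse-tree node; leaves are POS tags (hypothetical helper class)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def tt_features(root: Node) -> List[Dict[str, object]]:
    """Collect simple shape statistics for every non-terminal node.

    These features are assumptions chosen for illustration, not the
    feature set defined in the paper:
      weight    - number of leaves the node dominates
      height    - longest path (in edges) from the node down to a leaf
      branching - whether the first or last child dominates more leaves
    """
    feats: List[Dict[str, object]] = []

    def visit(node: Node) -> Tuple[int, int]:
        if not node.children:          # a leaf: weight 1, height 0
            return 1, 0
        stats = [visit(child) for child in node.children]
        weight = sum(w for w, _ in stats)
        height = 1 + max(h for _, h in stats)
        first, last = stats[0][0], stats[-1][0]
        if first == last:
            branching = "balanced"
        elif last > first:
            branching = "right"
        else:
            branching = "left"
        feats.append({"label": node.label, "weight": weight,
                      "height": height, "branching": branching})
        return weight, height

    visit(root)
    return feats                        # post-order: children before parents


# Toy tree: (S (NP DT NN) (VP VBD (NP DT JJ NN)))
tree = Node("S", [
    Node("NP", [Node("DT"), Node("NN")]),
    Node("VP", [Node("VBD"),
                Node("NP", [Node("DT"), Node("JJ"), Node("NN")])]),
])

for row in tt_features(tree):
    print(row)
```

Each statistic comes from a single bottom-up pass over the tree, which is in the spirit of the abstract's remark that TT feature values are easy to estimate. In this toy tree the root `S` dominates six leaves, has height 3, and is right-branching under the crude first-versus-last-child comparison used here.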
Similar resources
Sentence Realization with Unlexicalized Tree Linearization Grammars
Sentence realization, one of the important components of natural language generation, has taken a statistical swing in recent years. While most previous approaches make heavy use of lexical information in the form of N-gram language models, we propose a novel method based on unlexicalized tree linearization grammars. We formally define the grammar representation and demonstrate learning from...
Lexicalization of Probabilistic Grammars
Two general methods for the lexicalization of probabilistic grammars are presented; both are modular and powerful and require only a small number of parameters. The first method multiplies the unlexicalized parse tree probability by the exponential of the mutual information terms of all word-governor pairs in the parse. The second lexicalization method accounts for the dependencies between the dii...
Three-Dimensional Parametrization for Parsing Morphologically Rich Languages
Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase structures are often shallow, there are additional morphological factors that g...
2D Trie for Fast Parsing
In practical applications, decoding speed is very important. Modern structured learning techniques adopt template-based methods to extract millions of features. Complicated templates yield abundant features, which lead to higher accuracy but also to longer feature extraction time. We propose the Two Dimensional Trie (2D Trie), a novel, efficient feature indexing structure which takes advantage of relation...